HW 02

Author

Nathan Herling

Published

June 13, 2025

0 - Setup

[FYI]
'pacman' already installed — skipping install.
[FYI]
'dsbox' already installed — skipping GitHub install.
The packages loaded:
* tidyverse           * glue                * scales              * lubridate            
* patchwork           * ggh4x               * ggrepel             * openintro            
* ggridges            * dsbox               * janitor             * here                 
* knitr               * ggthemes            * ggplot2             * kableExtra           
* palmerpenguins      * grid                * htmltools           * plotly               

1 - A new day, a new plot, a new geom

Question #1

A new day, a new plot, a new geom. The goal of this exercise is to learn about a new type of plot (ridgeline plot) and to learn how to make it. Use the geom_density_ridges() function from the ggridges package to make a ridge plot of Airbnb review scores of Edinburgh neighborhoods. The neighborhoods should be ordered by their median review scores. The data can be found in the dsbox package, and it’s called edibnb. Also include an interpretation for your visualization. You should review feedback from your Homework 1 to make sure you capture anything you may have missed previously.
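The ordering step the prompt asks for can be sketched as follows. This is a minimal illustration, not the code behind the report: the toy data frame stands in for edibnb, its values are made up, and the column names neighbourhood and review_scores_rating are assumptions about that dataset. The plot is only built if ggplot2 and ggridges are installed.

```r
# Toy stand-in for edibnb; the values are made up for illustration only.
toy <- data.frame(
  neighbourhood = rep(c("A", "B", "C"), each = 4),
  review_scores_rating = c(90, 92, 94, NA, 98, 99, 100, 97, 80, 95, 96, 85)
)

# Order neighbourhoods by median score, ignoring missing ratings.
toy$neighbourhood <- reorder(
  toy$neighbourhood, toy$review_scores_rating,
  FUN = function(x) median(x, na.rm = TRUE)
)
print(levels(toy$neighbourhood))

# Build the ridgeline plot only when the plotting packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggridges", quietly = TRUE)) {
  library(ggplot2)
  ridge_plot <- ggplot(
    subset(toy, !is.na(review_scores_rating)),
    aes(x = review_scores_rating, y = neighbourhood)
  ) +
    ggridges::geom_density_ridges() +
    labs(x = "Review score", y = NULL)
}
```

Because the y variable is a factor whose levels are already sorted by median, the ridges stack in that order without any further sorting of the rows.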

Data Analysis - Q1
Table 1. Diagnostic Summary for review_scores_rating (edibnb data set)
Metric Value
Data Type numeric
Min 20
1st Quartile 93
Median 97
Mean 95.0246657029274
3rd Quartile 99
Max 100
Missing Values 2177
IQR 6
Lower Outlier Bound 84
Upper Outlier Bound 108
Outlier Count 576
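The outlier bounds in the table are consistent with the common 1.5 × IQR rule applied to the reported quartiles. The sketch below reproduces them; the helper names tukey_fences and count_outliers are hypothetical, and the rule itself is an assumption about how the bounds were computed.

```r
# 1.5 * IQR rule applied to a pair of quartiles.
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles reported for review_scores_rating.
fences <- tukey_fences(q1 = 93, q3 = 99)
print(fences)

# Count values outside the fences in a numeric vector (NA values ignored).
count_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  f <- tukey_fences(q[1], q[2], k)
  sum(x < f[["lower"]] | x > f[["upper"]], na.rm = TRUE)
}
```

Since the upper bound (108) is above the maximum score of 100, every one of the 576 flagged values must lie below the lower bound of 84.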


Interpretation
The graph, “Distribution of Airbnb Review Scores by Edinburgh Neighborhood,” uses a ridgeline plot to show how review scores are distributed within each neighborhood, with each neighborhood’s mean score also marked by a diamond (double encoding). Mean review scores are high everywhere, ranging from about 93.9 to 95.9 on a 0-100 scale. Some neighborhoods, such as Morningside and Bruntsfield, have slightly higher means. Differences in the spread of the ridges show how consistent reviews are within each neighborhood, which makes it easier to compare where listings tend to receive better feedback.

2 - Foreign Connected PACs

Question #2a


Make a graph: Contributions to US political parties from UK-connected PACs.

Data Analysis - Q2
Table 2. Diagnostic Summary for dems and repubs
Variable Metric Value
dems Data Type numeric
dems Min -9050
dems 1st Quartile 2000
dems Median 11000
dems Mean 35667.7589807853
dems 3rd Quartile 40000
dems Max 853223
dems Missing Values 0
dems IQR 38000
dems Lower Outlier Bound -55000
dems Upper Outlier Bound 97000
dems Outlier Count 245
repubs Data Type numeric
repubs Min -11000
repubs 1st Quartile 3000
repubs Median 18500
repubs Mean 50162.7623224729
repubs 3rd Quartile 58000
repubs Max 812500
repubs Missing Values 0
repubs IQR 55000
repubs Lower Outlier Bound -79500
repubs Upper Outlier Bound 140500
repubs Outlier Count 236

Question #2b


Make a graph: Contributions to US political parties from non-UK-connected PACs.
Let’s pick Switzerland.

3 - Median housing prices in the US

Question #3a


Re-create the graph: Median Housing Prices in the US - not seasonally adjusted

Data Analysis - Q3
Combined Diagnostic Summary for Median Housing and Recession Data
Dataset Metric Value
median_housing date - Data Type Date
median_housing price - Data Type numeric
median_housing date - Missing Values 0
median_housing price - Missing Values 0
median_housing price - Min 17800
median_housing price - 1st Quartile 49575
median_housing price - Median 124350
median_housing price - Mean 140386.752136752
median_housing price - 3rd Quartile 223350
median_housing price - Max 374900
median_housing date - Range Start 1963-01-01
median_housing date - Range End 2021-04-01
recessions start - Data Type Date
recessions end - Data Type Date
recessions start - Missing Values 0
recessions end - Missing Values 0
recessions start - Range Start 1857-06-01
recessions start - Range End 2020-02-01
recessions end - Range Start 1858-12-01
recessions end - Range End 2020-04-01

Question #3b


• Identify recessions that happened during the time frame of the median_housing dataset. Do this by adding a new variable to recessions that takes the value TRUE if the recession happened during this time frame and FALSE if not.
• Now recreate the following visualization. The shaded areas are recessions that happened during the time frame of the median_housing dataset. Hint: The shaded areas are “behind” the line.
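One way to build the TRUE/FALSE flag described above is sketched here. The housing date range comes from the diagnostic summary; the three recession rows are an illustrative subset rather than the full recessions data, and the column name in_range is hypothetical. The sketch treats a recession as inside the time frame only if it both starts and ends within it, which is one possible reading of the prompt.

```r
# Date range of median_housing, taken from the diagnostic summary.
housing_start <- as.Date("1963-01-01")
housing_end   <- as.Date("2021-04-01")

# Three illustrative recession rows (not the full dataset).
recessions <- data.frame(
  start = as.Date(c("1857-06-01", "2007-12-01", "2020-02-01")),
  end   = as.Date(c("1858-12-01", "2009-06-01", "2020-04-01"))
)

# TRUE when the recession lies entirely inside the housing time frame.
recessions$in_range <- recessions$start >= housing_start &
  recessions$end <= housing_end
print(recessions)
```

In ggplot2, layers are drawn in the order they are added, so adding the shaded rectangles (for example with geom_rect()) before geom_line() keeps the shading behind the line.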

3b-Note: Some recession rows were intentionally excluded for the purpose of the assignment.

Question #3c


• Create a subset of the median_housing dataset covering 2019 and 2020. Add two columns: year and quarter. year is the year of the date, and quarter takes the value Q1, Q2, Q3, or Q4 based on the date.
• Re-create the visualization.
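The subsetting and the two derived columns can be sketched in base R as below. The data frame here holds only a quarterly date column generated for illustration (no prices), so it is a stand-in for median_housing rather than the real data.

```r
# Quarterly dates around the period of interest; a stand-in for median_housing.
housing <- data.frame(
  date = seq(as.Date("2018-10-01"), as.Date("2021-04-01"), by = "quarter")
)

# Keep only rows from 2019 and 2020.
yr <- as.integer(format(housing$date, "%Y"))
housing_sub <- housing[yr %in% c(2019, 2020), , drop = FALSE]

# Add the year and the quarter label (Q1 to Q4) derived from the date.
housing_sub$year <- as.integer(format(housing_sub$date, "%Y"))
housing_sub$quarter <- quarters(housing_sub$date)
print(housing_sub)
```

With lubridate loaded, year() and quarter() would be an alternative way to derive the same columns.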

3c-Note: Some recession rows were intentionally excluded for the purpose of the assignment.

4 - Expect More. Plot More.

Question #4


Recreate the Target logo.
Steps (see code comments):
1. Make the inner dot.
2. Make the outer ring.
3. Make the ‘Target’ text, using the ‘[R]’ (registered trademark) symbol.
4. Piece it all together.
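The steps above can be sketched as stacked filled circles. This is one possible construction, not the report's actual code: the helper circle_points, the radii (1, 2, 3), the colour "#CC0000", and the label position are all assumptions chosen for illustration. The plot is only built if ggplot2 is installed.

```r
# Points on a circle of radius r centred at the origin.
circle_points <- function(r, n = 200) {
  t <- seq(0, 2 * pi, length.out = n)
  data.frame(x = r * cos(t), y = r * sin(t))
}

dot   <- circle_points(1)  # step 1: inner dot
outer <- circle_points(3)  # step 2: outer ring = red disc ...
gap   <- circle_points(2)  # ... with a white disc drawn on top of it

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Step 4: layers are drawn in order, so the largest disc goes first.
  logo <- ggplot(mapping = aes(x, y)) +
    geom_polygon(data = outer, fill = "#CC0000") +
    geom_polygon(data = gap, fill = "white") +
    geom_polygon(data = dot, fill = "#CC0000") +
    # Step 3: text label with the registered trademark symbol.
    annotate("text", x = 0, y = -3.8, label = "TARGET\u00AE",
             colour = "#CC0000", fontface = "bold", size = 8) +
    coord_fixed() +
    theme_void()
}
```

coord_fixed() keeps the aspect ratio at 1 so the circles are not stretched into ellipses.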

5 - Mirror, mirror on the wall, who’s the ugliest of them all?

Question #5


Mirror, mirror on the wall, who’s the ugliest of them all? Make a plot of the variables in the penguins dataset from the palmerpenguins package. Your plot should use at least two variables, but more is fine too. First, make the plot using the default theme and color scales. Then, update the plot to be as ugly as possible. You will probably want to play around with theme options, colors, fonts, etc. The ultimate goal is the ugliest possible plot, and the sky is the limit!

Snakes on a plane? No. Penguins on a Sphere!

Mapping Description:
• Bill Length → θ (polar angle, latitude)
• Flipper Length → φ (azimuthal angle, longitude)
• Radius is constant: r = 1
• Penguins are plotted on the surface of a unit sphere
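The mapping can be sketched as a rescaling followed by the standard spherical-to-Cartesian conversion. The rescaling ranges (θ onto [0, π], φ onto [0, 2π]), the convention that θ is measured from the z axis, the helper rescale_to, and the sample measurements are all assumptions for illustration; the report does not state how the variables were scaled.

```r
# Linearly rescale x onto the interval [lo, hi].
rescale_to <- function(x, lo, hi) {
  lo + (x - min(x)) / (max(x) - min(x)) * (hi - lo)
}

# Hypothetical measurements in mm (not taken from the penguins data).
bill_length    <- c(35, 40, 45, 50, 55)
flipper_length <- c(180, 190, 200, 210, 220)

theta <- rescale_to(bill_length, 0, pi)         # polar angle
phi   <- rescale_to(flipper_length, 0, 2 * pi)  # azimuthal angle

# Spherical to Cartesian with constant radius r = 1.
r <- 1
x <- r * sin(theta) * cos(phi)
y <- r * sin(theta) * sin(phi)
z <- r * cos(theta)
print(data.frame(x, y, z))
```

Under this scaling, the shortest bill lands on one pole (z = 1) and the longest on the other (z = -1), so all points with extreme bill lengths collapse together regardless of flipper length, which suits the goal of an ugly plot.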